Reorg and new custom decoder support #177

oschwald · 2025-06-22T21:20:05Z

Move decoder code into new internal package
Add change log
Make functional option functions return a function
Add support for options to Open and FromBytes
Add missing docs
Tighten up the golangci-lint config
Start decoupling the decoding and reflection
Move size checks to DataDecoder
Improve naming of data types and make them public
Add separate Decoder type for manual decoding
Add support for UnmarshalMaxMindDB

This will be used in the future for things like cache configuration. Adding now as it is a breaking change.

Although this doesn't provide immediate benefit, the intent is to make the data decoder available separately in a future commit for public use.

This further decouples the code. There is probably still room to reduce redundancy, however.

This is largely based on #91.

Extend the UnmarshalMaxMindDB interface to work recursively with nested types, matching the behavior of encoding/json's UnmarshalJSON. Custom unmarshalers are now called for:
- Struct fields that implement Unmarshaler
- Pointer fields (creates value if nil, then checks for Unmarshaler)
- Slice elements that implement Unmarshaler
- Map values that implement Unmarshaler

This enhancement allows for more flexible custom decoding strategies in complex data structures and lets nested types apply their own decoding optimizations.

Add PeekType() method to Decoder that returns the type of the current value without consuming it, similar to jsontext.Decoder.PeekKind(). This enables look-ahead parsing for conditional decoding logic. The method follows pointers to return the actual data type being pointed to rather than just returning TypePointer.
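The look-ahead behavior described here can be illustrated with a toy model. This is a minimal sketch, not the library's implementation: the `entry` type, the slice-backed `Decoder`, and the `target` index are all hypothetical stand-ins for the real binary format.

```go
package main

import "fmt"

// Kind identifies a data type; these names are illustrative stand-ins.
type Kind int

const (
	KindPointer Kind = iota
	KindString
	KindMap
)

// entry is a toy record: a pointer entry stores the index it points to.
type entry struct {
	kind   Kind
	target int
}

// Decoder is a hypothetical decoder over a slice of toy entries.
type Decoder struct {
	entries []entry
	pos     int
}

// PeekKind reports the kind of the current value without advancing pos.
// Pointers are followed so the caller sees the kind of the pointed-to data.
func (d *Decoder) PeekKind() Kind {
	i := d.pos
	for d.entries[i].kind == KindPointer {
		i = d.entries[i].target
	}
	return d.entries[i].kind
}

func main() {
	d := &Decoder{entries: []entry{
		{kind: KindPointer, target: 2},
		{kind: KindString},
		{kind: KindMap},
	}}
	if d.PeekKind() == KindMap {
		fmt.Println("current value is a map; position is still", d.pos)
	}
}
```

Because the peek leaves the position untouched, a caller can branch on the kind and then invoke whichever read method matches.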

The Decoder and Unmarshaler types are now available in the mmdbdata package for applications that need direct access to the decoding API.

Renames Type* constants to Kind* and PeekType() to PeekKind() throughout the codebase to match Go's encoding/json/v2 naming conventions. This improves API consistency with the standard library.

Renames all Decoder methods from Decode* to Read* (e.g., DecodeString to ReadString) to match Go's encoding/json/v2 jsontext.Decoder naming conventions. This improves API consistency with the standard library.

Adds DecoderOption type and variadic options parameter to enable future configuration without breaking API changes. Follows existing library patterns like ReaderOption and NetworksOption.
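The options pattern referred to here (constructors that return a closure, passed through a variadic parameter) can be sketched as follows. The names `readerOptions`, `WithCacheSize`, and `open` are hypothetical and chosen only for illustration; the PR does not say which options exist.

```go
package main

import "fmt"

// readerOptions collects settings; the cacheSize field is hypothetical.
type readerOptions struct {
	cacheSize int
}

// ReaderOption mutates the settings. Option constructors return this type.
type ReaderOption func(*readerOptions)

// WithCacheSize is a hypothetical option constructor that returns a closure.
func WithCacheSize(n int) ReaderOption {
	return func(o *readerOptions) { o.cacheSize = n }
}

// open stands in for a constructor such as Open or NewDecoder. The variadic
// parameter lets new options be added later without changing the signature.
func open(path string, opts ...ReaderOption) readerOptions {
	o := readerOptions{cacheSize: 512} // default used when no option is given
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Println(open("example.mmdb").cacheSize)
	fmt.Println(open("example.mmdb", WithCacheSize(64)).cacheSize)
}
```

Existing callers that pass no options keep compiling when a new option is introduced, which is why adding the parameter early avoids a later breaking change.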

Improved error messages to include byte offset information and, for the reflection-based API, path information for nested structures using JSON Pointer format. For example, errors may now show "at offset 1234, path /city/names/en" or "at offset 1234, path /list/0/name" instead of just the underlying error message. The implementation maintains zero allocation on the happy path through retroactive path building during error unwinding.
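One way to build the path only while an error unwinds is sketched below. It is an assumption-laden illustration, not the code in internal/mmdberrors: `pathError`, `newOffsetError`, `withSegment`, and the exact message layout after the path are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// pathError is a hypothetical wrapper. Segments are stored innermost first
// because each caller appends its own segment as the error travels upward.
type pathError struct {
	err      error
	offset   uint
	segments []string
}

func (e *pathError) Error() string {
	path := ""
	for i := len(e.segments) - 1; i >= 0; i-- {
		path += "/" + e.segments[i]
	}
	return fmt.Sprintf("at offset %d, path %s: %v", e.offset, path, e.err)
}

func (e *pathError) Unwrap() error { return e.err }

// newOffsetError records where decoding failed.
func newOffsetError(err error, offset uint) error {
	return &pathError{err: err, offset: offset}
}

// withSegment adds one path segment. It runs only when an error exists,
// so successful decodes never allocate path data.
func withSegment(err error, seg string) error {
	var pe *pathError
	if errors.As(err, &pe) {
		pe.segments = append(pe.segments, seg)
		return pe
	}
	return &pathError{err: err, segments: []string{seg}}
}

var errBadType = errors.New("unexpected type")

func decodeName(fail bool) error {
	if fail {
		return withSegment(newOffsetError(errBadType, 1234), "name")
	}
	return nil
}

func decodeElem(i int, fail bool) error {
	if err := decodeName(fail); err != nil {
		return withSegment(err, strconv.Itoa(i))
	}
	return nil
}

func decodeList(fail bool) error {
	if err := decodeElem(0, fail); err != nil {
		return withSegment(err, "list")
	}
	return nil
}

func main() {
	fmt.Println(decodeList(true))
}
```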

Replaces unbounded cache with fixed 512-entry array using offset-based indexing. Provides 15% performance improvement while preventing memory growth and ensuring thread safety for concurrent reader usage.

Move size validation logic to DataDecoder to eliminate duplication and create single source of truth for data validation.

Update the UnmarshalMaxMindDB docs and make them more accurate.

- Add comprehensive field documentation to Metadata struct with references to MaxMind DB specification
- Add BuildTime() convenience method to convert BuildEpoch to time.Time
- Enhance Verify() documentation explaining validation scope and use cases
- Add ExampleReader_Verify showing verification and metadata access
- Add TestMetadataBuildTime to verify BuildTime() method correctness
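The BuildEpoch-to-time.Time conversion amounts to interpreting the epoch as Unix seconds. A minimal sketch, assuming a `Metadata` struct reduced to the one relevant field (the real struct has more fields, and the library's method body may differ):

```go
package main

import (
	"fmt"
	"time"
)

// Metadata is trimmed to the single field this sketch needs.
type Metadata struct {
	BuildEpoch uint // database build time as seconds since the Unix epoch
}

// BuildTime converts BuildEpoch to a time.Time.
func (m Metadata) BuildTime() time.Time {
	return time.Unix(int64(m.BuildEpoch), 0)
}

func main() {
	m := Metadata{BuildEpoch: 1750627205}
	fmt.Println(m.BuildTime().UTC().Format(time.RFC3339))
}
```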

Add makeTestName helper function to create reasonable test names from long hex strings. TestDecodeByte and TestDecodeString now show concise names like '9e06b37878787878...7878' instead of extremely long hex strings that made test output unreadable.
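A helper of this kind might look like the sketch below. The prefix and suffix lengths (16 and 4) are inferred from the example name in the commit message and are assumptions, as is the rule for leaving short inputs unchanged.

```go
package main

import "fmt"

// makeTestName shortens long hex strings to "<prefix>...<suffix>".
// The lengths are guesses based on the example in the commit message.
func makeTestName(hexStr string) string {
	const prefixLen, suffixLen = 16, 4
	if len(hexStr) <= prefixLen+suffixLen+len("...") {
		return hexStr // already short enough to read
	}
	return hexStr[:prefixLen] + "..." + hexStr[len(hexStr)-suffixLen:]
}

func main() {
	fmt.Println(makeTestName("9e06b37878787878787878787878787878787878787878"))
	fmt.Println(makeTestName("2a"))
}
```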

Remove the experimental deserializer interface and all supporting code:
- Delete deserializer.go interface definition
- Delete deserializer_test.go test file
- Remove deserializer support from reflection.go
- Remove deserializer methods from data_decoder.go
- Remove unused math/big import from data_decoder.go
- Add breaking change notice to changelog recommending UnmarshalMaxMindDB

Error messages for type mismatches now display readable type names like 'Map' and 'Slice' instead of numeric codes, making debugging easier.

Make StringCache, buffer access, InternAt, and all DataDecoder methods package-private since they are only used within the decoder package. Keeps Kind methods public as they are exposed via mmdbdata type alias.

Replaces global mutex with per-entry mutexes to reduce allocation count from 33 to 10 per operation in downstream libraries while maintaining thread safety and good concurrent performance.
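A fixed-size, offset-indexed string cache with one mutex per entry can be sketched as follows. The 512-entry size comes from the earlier commit message; everything else (the `cacheEntry` layout, the modulo indexing, and the `internAt` body) is a guess at the general technique rather than the library's code.

```go
package main

import (
	"fmt"
	"sync"
)

const cacheSize = 512 // entry count taken from the commit message

// cacheEntry pairs a cached string with the data offset it came from.
// Each entry has its own lock, so lookups for different slots never contend.
type cacheEntry struct {
	mu     sync.Mutex
	offset uint
	str    string
}

// stringCache is bounded: a colliding offset overwrites the slot.
type stringCache struct {
	entries [cacheSize]cacheEntry
}

// internAt returns data[offset:offset+size] as a string, reusing the cached
// copy when the slot already holds the same offset and length.
func (c *stringCache) internAt(data []byte, offset, size uint) string {
	if size == 0 {
		return ""
	}
	e := &c.entries[offset%cacheSize]
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.offset == offset && uint(len(e.str)) == size {
		return e.str // cache hit: no new string allocation
	}
	s := string(data[offset : offset+size])
	e.offset, e.str = offset, s
	return s
}

func main() {
	c := &stringCache{}
	data := []byte("hello world")
	fmt.Println(c.internAt(data, 0, 5), c.internAt(data, 6, 5))
}
```

Because the array never grows, memory use stays constant regardless of how many distinct strings a database contains.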

Add BenchmarkCityLookupConcurrent to demonstrate string cache performance improvements under concurrent load. Tests 1, 4, 16, and 64 goroutines performing realistic city lookups, providing clear metrics for concurrent scaling behavior.

Use direct type assertion instead of reflection-based interface checking in Decode method for better performance.

Replace nodeReader interface with specialized traverseTree functions for each record size. Eliminates interface dispatch overhead and implements branchless offset calculations for improved performance.
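For 24-bit records, where each node occupies 6 bytes (left record in the first 3, right record in the next 3), the branchless offset calculation might be written as below. The function name `traverseTree24` and its signature are assumptions for this sketch; only the record layout comes from the MaxMind DB format.

```go
package main

import "fmt"

// traverseTree24 walks a search tree with 24-bit records. It stops when the
// record value is no longer a node index or when the address bits run out.
func traverseTree24(tree []byte, nodeCount uint, ip []byte, bitCount uint) uint {
	node := uint(0)
	for i := uint(0); i < bitCount && node < nodeCount; i++ {
		// Extract bit i of the address, most significant bit first.
		bit := uint(ip[i>>3]>>(7-(i&7))) & 1
		// bit selects the left (0) or right (1) record arithmetically,
		// so no if/else is needed to pick the offset.
		off := node*6 + bit*3
		node = uint(tree[off])<<16 | uint(tree[off+1])<<8 | uint(tree[off+2])
	}
	return node
}

func main() {
	// Two nodes: node 0 -> (1, 5), node 1 -> (7, 9). Values >= 2 are leaves.
	tree := []byte{0, 0, 1, 0, 0, 5, 0, 0, 7, 0, 0, 9}
	fmt.Println(traverseTree24(tree, 2, []byte{0x40}, 8))
}
```

A function specialized for one record size lets the compiler inline the record read, which an interface method call would prevent.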

Replace reflect.Type.Implements() with type assertion using comma-ok idiom and remove redundant pointer interface check. The recursive decode call handles pointer receiver implementations via CanAddr(). Eliminates unmarshalerType variable and reduces code complexity while maintaining identical functionality and performance.
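The comma-ok approach can be illustrated with a small stand-alone sketch. The `Decoder` and `Name` types and the `tryCustom` and `decodeInto` helpers are hypothetical; only the interface method name is taken from the PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// Decoder is a placeholder for the real decoder type.
type Decoder struct{ raw string }

// Unmarshaler mirrors the shape of the interface named in this PR.
type Unmarshaler interface {
	UnmarshalMaxMindDB(d *Decoder) error
}

// Name implements Unmarshaler with a pointer receiver.
type Name struct{ Value string }

func (n *Name) UnmarshalMaxMindDB(d *Decoder) error {
	n.Value = "custom:" + d.raw
	return nil
}

// tryCustom reports whether v was decoded through Unmarshaler. Taking the
// address first lets pointer-receiver implementations match, and the
// comma-ok assertion replaces a reflect.Type.Implements check.
func tryCustom(v reflect.Value, d *Decoder) (bool, error) {
	if !v.CanAddr() {
		return false, nil
	}
	u, ok := v.Addr().Interface().(Unmarshaler)
	if !ok {
		return false, nil
	}
	return true, u.UnmarshalMaxMindDB(d)
}

// decodeInto expects a non-nil pointer, as reflection-based decoders do.
func decodeInto(target any, d *Decoder) (bool, error) {
	return tryCustom(reflect.ValueOf(target).Elem(), d)
}

func main() {
	var n Name
	var i int
	handled, _ := decodeInto(&n, &Decoder{raw: "x"})
	fmt.Println(handled, n.Value)
	handled, _ = decodeInto(&i, &Decoder{raw: "x"})
	fmt.Println(handled)
}
```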

Remove inaccurate 15% performance improvement claim that was contradicted by benchmark testing. Add missing BREAKING CHANGE label for network options API changes.

This commit implements encoding/json/v2 style field precedence rules for struct field resolution, replacing the previous two-phase processing with a single-phase approach that properly handles field conflicts using depth-based and tag-based precedence.

Key changes:
- Replace fieldsType.anonymousFields with fieldInfo metadata structure
- Implement breadth-first traversal for field collection with depth tracking
- Add support for embedded pointer types (*EmbeddedStruct)
- Apply json/v2 precedence rules: shallow beats deep, tagged beats untagged
- Use single-phase processing with FieldByIndex for embedded field access
- Initialize nil embedded pointers during field traversal

Precedence rules applied:
1. Shallowest embedding depth wins
2. Among same depth, explicitly tagged field wins over untagged
3. Among same depth and tag status, first declared wins

Fixes embedded pointer field access that was causing nil pointer dereferences in complex nested structures.
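The three precedence rules can be expressed as a single comparison. This is a sketch under assumptions: the `candidate` type and its fields are hypothetical and are not the `fieldInfo` structure from the commit.

```go
package main

import "fmt"

// candidate describes one struct field competing for a database key.
type candidate struct {
	name   string // key the field would decode
	source string // label used here only to tell candidates apart
	depth  int    // embedding depth; 0 is the outermost struct
	tagged bool   // true if the field has an explicit maxminddb tag
	order  int    // declaration order within the traversal
}

// better applies the rules in order: shallower depth, then tagged over
// untagged, then first declared.
func better(a, b candidate) bool {
	if a.depth != b.depth {
		return a.depth < b.depth
	}
	if a.tagged != b.tagged {
		return a.tagged
	}
	return a.order < b.order
}

// resolve keeps the winning candidate for each key in a single pass.
func resolve(cands []candidate) map[string]candidate {
	winners := make(map[string]candidate, len(cands))
	for _, c := range cands {
		if cur, ok := winners[c.name]; !ok || better(c, cur) {
			winners[c.name] = c
		}
	}
	return winners
}

func main() {
	w := resolve([]candidate{
		{name: "id", source: "embedded", depth: 1, tagged: true, order: 0},
		{name: "id", source: "outer", depth: 0, tagged: false, order: 1},
	})
	fmt.Println(w["id"].source)
}
```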

Adds basic validation for maxminddb struct tags inspired by encoding/json/v2's tag validation approach. It currently validates the UTF-8 encoding of tag values and provides a foundation for future tag validation improvements.

The validation is designed to be non-intrusive: validation errors are currently ignored to maintain backward compatibility, but the infrastructure is in place for future enhancements. This follows the json/v2 pattern of catching obvious user mistakes while being permissive about edge cases that might be legitimate.
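The UTF-8 check itself is small. A minimal sketch, assuming a hypothetical `validateTag` helper and error value (neither name is from the PR):

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidTag is a hypothetical error value for this sketch.
var errInvalidTag = errors.New("maxminddb tag is not valid UTF-8")

// validateTag checks only the encoding of the tag value.
func validateTag(tag string) error {
	if !utf8.ValidString(tag) {
		return errInvalidTag
	}
	return nil
}

func main() {
	fmt.Println(validateTag("city"))
	// A caller that ignores the result keeps accepting tags it accepted
	// before, which matches the backward-compatible behavior described.
	_ = validateTag("\xff")
}
```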

Implement field index reindexing and addressable value wrapper to reduce reflection overhead. Split field indices for faster access and eliminate redundant bounds checks during field traversal. Based on encoding/json/v2 optimizations for better performance in struct field access patterns.

Optimizes the reflection-based decoder with more efficient field access patterns and reduced memory allocations. Eliminates duplicate code paths and unused functions. Performance improvement measured in geoip2-golang benchmarks.

Copilot

Pull Request Overview

This PR reorganizes the decoder implementation into internal packages, adds a new custom decoding API, and tightens error handling and configuration.

- Moved all decoding logic into internal/decoder, exposing a high-level decoder.ReflectionDecoder and a public mmdbdata.Decoder factory.
- Replaced legacy error functions with the internal/mmdberrors package for structured, contextual errors.
- Refactored options to use functional option patterns and updated APIs (Reader.Open/FromBytes, Networks, Verify, Result.Decode), plus added missing docs, tests, and benchmarks.

Reviewed Changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
verifier.go	Switched to use `ReflectionDecoder.VerifyDataSection` and `mmdberrors` for errors.
traverse_test.go	Updated test to call network options as functions (`IncludeAliasedNetworks()`).
traverse.go	Refactored `NetworksOption` funcs to return closures and tree traversal adjustments.
result.go	Simplified `Decode`/`DecodePath` to delegate to `decoder.ReflectionDecoder`.
reader_test.go	Fixed literal types, updated closed-DB error messages, added concurrent benchmark.
reader.go	Overhauled `Reader` to use `ReflectionDecoder`, introduced `ReaderOption`, enriched docs.
node.go	Removed – replaced by `readNodeBySize` in `traverse.go`.
mmdbdata/type.go	Added public type and factory aliases for custom decoding.
mmdbdata/interface.go	Introduced `Unmarshaler` interface for custom unmarshaling.
mmdbdata/doc.go	Added package documentation for `mmdbdata`.
internal/mmdberrors/errors.go	Defined `InvalidDatabaseError` and `UnmarshalTypeError` types.
internal/mmdberrors/context.go	Added `ContextualError`, `WrapWithContext`, and `PathBuilder`.
internal/decoder/verifier.go	Moved `verifyDataSection` implementation into `ReflectionDecoder`.
internal/decoder/*	Moved core decoding, reflection logic, caching, and extensive tests under `internal/decoder`.

Comments suppressed due to low confidence (5)

reader.go:107

The code uses errors.New but the "errors" package is not imported. Add "errors" to the import block.

import (

reader.go:331

This call to errors.New will not compile because the errors package isn’t imported. Ensure errors is imported.

		return Result{err: errors.New("cannot call Lookup on a closed database")}

reader.go:362

This call to errors.New also requires importing the errors package. Add the import to fix the compile error.

		return Result{err: errors.New("cannot call LookupOffset on a closed database")}

reader_test.go:1033

You can’t range over an integer. Replace with a standard for-loop, e.g., for i := 0; i < numGoroutines; i++ {.

				for range numGoroutines {

reader_test.go:1042

Likewise, lookupsPerGoroutine is an integer, not a slice. Use for i := 0; i < lookupsPerGoroutine; i++ { instead.

						for range lookupsPerGoroutine {

Copilot · 2025-07-05T20:50:03Z

result.go

 type Result struct {
 	ip        netip.Addr
 	err       error
-	decoder   decoder
+	decoder   decoder.ReflectionDecoder


[nitpick] The field name decoder shadows the imported package name decoder. Consider renaming the field (e.g., refDecoder) to improve clarity.

Suggested change

-	decoder decoder.ReflectionDecoder
+	refDecoder decoder.ReflectionDecoder

Update ReadMap and ReadSlice to return collection size along with iterators, enabling efficient pre-allocation of maps and slices. Iterator remains the primary return value for natural usage patterns.
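Returning the size next to the iterator lets a caller allocate the destination once. The sketch below uses a hypothetical `readMap` over an in-memory slice; its signature is an assumption and not the library's. The iterator is invoked with an explicit callback so the example does not depend on range-over-func support.

```go
package main

import "fmt"

// pair is a toy key/value entry standing in for decoded map data.
type pair struct{ key, value string }

// readMap is a hypothetical stand-in: iterator first, then size, then error.
func readMap(entries []pair) (func(yield func(pair, error) bool), uint, error) {
	it := func(yield func(pair, error) bool) {
		for _, e := range entries {
			if !yield(e, nil) {
				return
			}
		}
	}
	return it, uint(len(entries)), nil
}

// decodeNames pre-allocates the map using the reported size.
func decodeNames(entries []pair) (map[string]string, error) {
	it, size, err := readMap(entries)
	if err != nil {
		return nil, err
	}
	names := make(map[string]string, size) // sized once, no rehash growth
	var iterErr error
	it(func(p pair, err error) bool {
		if err != nil {
			iterErr = err
			return false // stop iterating on the first error
		}
		names[p.key] = p.value
		return true
	})
	return names, iterErr
}

func main() {
	names, err := decodeNames([]pair{{"en", "London"}, {"fr", "Londres"}})
	fmt.Println(len(names), names["fr"], err)
}
```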

oschwald force-pushed the greg/decoder branch from 2acd825 to c18b036 Compare June 22, 2025 21:40

oschwald requested a review from Copilot June 22, 2025 21:46

This comment was marked as outdated.

oschwald added 15 commits June 28, 2025 16:20

Move decoder code into new internal package

828da60

Add change log

a1dd40a

Make functional option functions return a function

e910ce2

Add support for options to Open and FromBytes

6137730

This will be used in the future for things like cache configuration. Adding now as it is a breaking change.

Add missing docs

d0b1b90

Tighten up the golangci-lint config

c99b49d

Start decoupling the decoding and reflection

eb2699b

Although this doesn't provide immediate benefit, the intent is to make the data decoder available separately in a future commit for public use.

Move size checks to DataDecoder

0398bcd

This further decouples the code. There is probably still room to reduce redundancy, however.

Improve naming of data types and make them public

40a9a76

Add separate Decoder type for manual decoding

e38dbaa

This is largely based on #91.

Add support for UnmarshalMaxMindDB

3304dbb

Improve go docs

8b7bd07

Update README.md

8c89e70

Add bounds check suggested by Copilot

b97e1e3

oschwald force-pushed the greg/decoder branch from 2bb951e to 2d8cb52 Compare June 28, 2025 23:20

oschwald force-pushed the greg/decoder branch 3 times, most recently from 652ee9d to 9e78231 Compare June 30, 2025 02:49

oschwald requested a review from Copilot June 30, 2025 02:52

This comment was marked as outdated.

oschwald force-pushed the greg/decoder branch from d19914b to 3a91c2e Compare July 1, 2025 23:32

oschwald added 4 commits July 1, 2025 19:50

Move Decoder to public mmdbdata package

f50aa68

The Decoder and Unmarshaler types are now available in the mmdbdata package for applications that need direct access to the decoding API.

Rename Type to Kind to align with encoding/json/v2

fd67384

Renames Type* constants to Kind* and PeekType() to PeekKind() throughout the codebase to match Go's encoding/json/v2 naming conventions. This improves API consistency with the standard library.

Rename Decode methods to Read to align with jsontext

53894a8

Renames all Decoder methods from Decode* to Read* (e.g., DecodeString to ReadString) to match Go's encoding/json/v2 jsontext.Decoder naming conventions. This improves API consistency with the standard library.

Add options pattern to NewDecoder

8268b31

Adds DecoderOption type and variadic options parameter to enable future configuration without breaking API changes. Follows existing library patterns like ReaderOption and NetworksOption.

oschwald added 15 commits July 1, 2025 19:50

Add files and test databases to .gitignore

31e9bd1

Add thread-safe bounded string cache

d920bbf

Replaces unbounded cache with fixed 512-entry array using offset-based indexing. Provides 15% performance improvement while preventing memory growth and ensuring thread safety for concurrent reader usage.

Consolidate decoder validation to DataDecoder

8175e20

Move size validation logic to DataDecoder to eliminate duplication and create single source of truth for data validation.

Fixes for golangci-lint v2.2.0

9b0c370

Update UnmarshalMaxMindDB docs

a614e6c

And make them more accurate.

Fix minor documentation errors and typos

6fc7076

Test on recent Go versions

94982fe

Improve error messages to show type names instead of numbers

752d9a5

Error messages for type mismatches now display readable type names like 'Map' and 'Slice' instead of numeric codes, making debugging easier.

Make internal decoder exports package-private

4aa49e9

Make StringCache, buffer access, InternAt, and all DataDecoder methods package-private since they are only used within the decoder package. Keeps Kind methods public as they are exposed via mmdbdata type alias.

Reduce string allocation overhead in decoders

78992f1

Replaces global mutex with per-entry mutexes to reduce allocation count from 33 to 10 per operation in downstream libraries while maintaining thread safety and good concurrent performance.

Add concurrent city lookup benchmark

af09d4f

Add BenchmarkCityLookupConcurrent to demonstrate string cache performance improvements under concurrent load. Tests 1, 4, 16, and 64 goroutines performing realistic city lookups, providing clear metrics for concurrent scaling behavior.

oschwald force-pushed the greg/decoder branch from 3a91c2e to af09d4f Compare July 2, 2025 02:50

oschwald added 8 commits July 4, 2025 11:04

Reduce decoder reflection overhead

8230991

Use direct type assertion instead of reflection-based interface checking in Decode method for better performance.

Optimize tree traversal with specialized functions

4f8b5f8

Replace nodeReader interface with specialized traverseTree functions for each record size. Eliminates interface dispatch overhead and implements branchless offset calculations for improved performance.

Fix changelog typo and document breaking changes

2b2d500

Remove inaccurate 15% performance improvement claim that was contradicted by benchmark testing. Add missing BREAKING CHANGE label for network options API changes.

Improve MMDB decoding performance by 2.4%

92c2915

Optimizes the reflection-based decoder with more efficient field access patterns and reduced memory allocations. Eliminates duplicate code paths and unused functions. Performance improvement measured in geoip2-golang benchmarks.

oschwald requested a review from Copilot July 5, 2025 20:46

Copilot AI reviewed Jul 5, 2025

Add size return to ReadMap and ReadSlice methods

0719621

Update ReadMap and ReadSlice to return collection size along with iterators, enabling efficient pre-allocation of maps and slices. Iterator remains the primary return value for natural usage patterns.

oschwald force-pushed the greg/decoder branch from 152bf9f to 0719621 Compare July 5, 2025 20:52

oschwald merged commit 0719621 into main Jul 5, 2025
16 checks passed

oschwald deleted the greg/decoder branch July 5, 2025 20:53
